Theoretical Properties (Continued)





Point clouds are sets of points that arise in many situations such as LIDAR 
data and samplings of 3D meshes. Traditional neural networks struggle 
with point clouds because their output depends on the order of the input, 
and sets with n points have n! permutations to worry about. PointNet [1] 
solves this problem directly by being permutation invariant by design. 





Feed-Forward Neural Networks 
Above is a 2-hidden-layer neural network with activation function $\sigma$. In
general, a feed-forward neural network applies an alternating sequence
of linear transformations $L_i$ and nonlinear transformations $\sigma_i$ to an input $x$:


$$f(x) = \sigma_n(L_n(\ldots \sigma_2(L_2(\sigma_1(L_1(x)))) \ldots))$$
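The alternating composition above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`linear`, `elementwise`, `feed_forward`), the layer sizes, the weights, and the choice of ReLU are all assumptions made for the example, and each "linear" layer includes a bias term.

```python
def linear(weights, bias):
    """Return the map x -> W x + b, with W given as a list of rows."""
    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def elementwise(phi):
    """Turn a scalar function phi into a component-wise nonlinearity."""
    def apply(x):
        return [phi(v) for v in x]
    return apply


def feed_forward(layers):
    """layers is a list of (L_i, sigma_i) pairs, applied in order i = 1..n."""
    def f(x):
        for L, sigma in layers:
            x = sigma(L(x))
        return x
    return f


relu = elementwise(lambda v: max(0.0, v))
identity = elementwise(lambda v: v)

# Hypothetical network with 2 hidden layers: R^2 -> R^3 -> R^2 -> R^1.
net = feed_forward([
    (linear([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.1, -0.2]), relu),
    (linear([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.0]), relu),
    (linear([[1.0, 1.0]], [0.5]), identity),
])

y = net([1.0, 2.0])
print(y)
```

Note that the output of `net` depends on the order of the input coordinates, which is exactly what makes such a network awkward for unordered sets.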


PointNet 
Above, $f$ and $g$ are traditional neural networks and $F_{PN}$ is a PointNet. By
taking the component-wise maximum (a.k.a. max-pooling) of transformed
elements of the set, PointNet achieves permutation invariance.
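A minimal sketch of this idea, assuming the PointNet has the form $F_{PN}(A) = g(\max_{a \in A} f(a))$ with the maximum taken component-wise. The functions `f` and `g` below are hypothetical hand-written stand-ins for the two neural networks, chosen only so the example runs without any dependencies.

```python
import random


def pointnet(f, g):
    """Build F(A) = g(component-wise max of f(a) over all points a in A)."""
    def F(points):
        features = [f(p) for p in points]                    # transform each point
        pooled = [max(column) for column in zip(*features)]  # max-pool per feature
        return g(pooled)
    return F


# Hypothetical stand-ins for the networks f and g.
def f(p):
    x, y, z = p
    return [x + y, y - z, x * z, max(0.0, x)]


def g(v):
    return [sum(v), v[0] - v[3]]


F = pointnet(f, g)

cloud = [(0.1, 0.2, 0.3), (1.0, -1.0, 0.5), (-0.4, 0.9, 0.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# Reordering the points does not change the output.
print(F(cloud) == F(shuffled))  # True
```

Because `max` ignores the order of its arguments, the pooled feature vector is the same for every ordering of the points, no matter what `f` and `g` are.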


Theoretical Properties of PointNet 


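• PointNet is continuous with respect to the Hausdorff distance.
The Hausdorff distance $d_H$ is a metric that measures how different
two non-empty compact subsets are. For a metric space $(Q, d)$, it’s given
by:

$$d_H(X, Y) = \inf\{\varepsilon \ge 0 : X \subseteq Y^{\varepsilon} \text{ and } Y \subseteq X^{\varepsilon}\}$$

where $X^{\varepsilon}$ is the $\varepsilon$-fattening of $X$, i.e., the set of all points within $\varepsilon$ of $X$.


This can be visualized in $(\mathbb{R}^2, d_{\text{Euclidean}})$ as:

[Figure: two sets $X$ and $Y$ with the distance $d_H(X, Y)$ marked]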


• Universal Approximation. For any $\varepsilon > 0$ and uniformly $d_H$-continuous
function $F$, there’s a PointNet $F_{PN}$ so that $|F(A) - F_{PN}(A)| < \varepsilon$ for all
point clouds $A$ in the unit ball.
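For finite point clouds, the Hausdorff distance defined above reduces to a max-min formula: the smallest $\varepsilon$ such that every point of each set lies within $\varepsilon$ of the other set. A small sketch under the assumption of the Euclidean metric (the function names are made up for this example):

```python
import math


def directed_distance(X, Y):
    """Largest distance from a point of X to its nearest point in Y."""
    return max(min(math.dist(x, y) for y in Y) for x in X)


def hausdorff(X, Y):
    """Hausdorff distance between two non-empty finite point sets."""
    return max(directed_distance(X, Y), directed_distance(Y, X))


X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]

# Every point of X is also in Y, but (1, 2) in Y is 2 away from X.
print(hausdorff(X, Y))  # 2.0
```

Taking the maximum of both directed distances is what makes the result symmetric; either direction on its own is not.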





Training in General 


We are given some data $D = \{(x_i, y_i)\}_{i=1}^{N}$ and a parametric model $f_w$ with weights $w$.


• Loss Function / Objective Function: For the machine to know how far
predictions are from actual observations we use a loss function $\ell(y, \hat{y})$,
where $\hat{y}$ is the predicted value and $y$ is the ground truth.

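• Gradient Descent (GD): For a given loss function, GD is an optimization
algorithm that seeks $\min_w \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, f_w(x_i))$, the minimum
average loss over the $N$ data points. GD updates the weights by
$w^{t+1} = w^{t} - \alpha \nabla_w \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, f_w(x_i))$, with the gradient
evaluated at $w^{t}$, where $\alpha > 0$ is the so-called learning rate.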

• Stochastic Gradient Descent (SGD): Similar to GD except the
whole-data gradient is estimated using gradients over smaller batches
of data. SGD has a reduced computation time per iteration and smaller
memory cost than GD. Many epochs (an epoch is one full pass of the
algorithm over the whole dataset) may be needed.

• Backpropagation: An algorithm which efficiently computes the
gradient $\nabla_w(\cdot)$ with respect to network weights via careful use of the
chain rule, iterating backwards from the final layer to ‘unpack’ one layer
at a time, thus avoiding calculating anything twice. [2]
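The difference between GD and SGD can be seen on a toy problem. The sketch below assumes a one-parameter model $f_w(x) = w x$ with squared-error loss, so the gradient can be written by hand instead of using backpropagation. The data, learning rate, batch size, and function names are illustrative choices only.

```python
import random


def grad_mse(w, batch):
    """Gradient with respect to w of the mean of (y - w*x)^2 over the batch."""
    return sum(-2.0 * x * (y - w * x) for x, y in batch) / len(batch)


def gradient_descent(data, w=0.0, lr=0.05, steps=200):
    """Each update uses the gradient over the whole dataset."""
    for _ in range(steps):
        w -= lr * grad_mse(w, data)
    return w


def sgd(data, w=0.0, lr=0.05, epochs=50, batch_size=2, seed=0):
    """Each update uses the gradient over one small batch."""
    rng = random.Random(seed)
    data = list(data)  # copy, so shuffling does not affect the caller
    for _ in range(epochs):  # one epoch = one pass over the dataset
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            w -= lr * grad_mse(w, data[i:i + batch_size])
    return w


# Noise-free toy data generated with w = 3.
data = [(x, 3.0 * x) for x in (-2.0, -1.0, 0.5, 1.0, 1.5, 2.0)]

print(gradient_descent(data))
print(sgd(data))
```

Both runs approach $w = 3$. Each SGD update only touches `batch_size` points, which is where the lower cost per iteration comes from.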


Training a Classifier 


• Softmax: A function for transforming $x = (x_1, \ldots, x_m) \in \mathbb{R}^m$ into a
probability distribution by $s(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{m} e^{x_j}}$. Classification models output
the class with the largest associated probability, e.g. 77% toilet.

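• Information Entropy: Given by $H(p) = -\sum_i p_i \log p_i$ for a probability
distribution $p$. It’s deeply related to data compression: with base-2
logarithms, $H(p)$ is a lower bound on the average number of bits per
symbol needed to losslessly encode samples drawn from $p$.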

• Cross-Entropy Loss: A measure, in bits of information, of how far a
predicted probability distribution is from the true one (it is not
symmetric, so it is not a metric in the strict sense). Cross-entropy loss
can be defined as $H(y, \hat{y}) = -\sum_i y_i \log_2(\hat{y}_i)$, where $y$ is the
ground truth and $\hat{y}$ is the prediction. Often used in classification
problems, including our experiment.
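The three definitions above fit in a few lines of Python. This sketch assumes base-2 logarithms throughout, so that entropy and cross-entropy are measured in bits, and uses the convention $0 \log 0 = 0$. Subtracting the maximum inside `softmax` is a common numerical-stability trick that does not change the result. The example scores are made up.

```python
import math


def softmax(x):
    """Map real scores to a probability distribution."""
    m = max(x)  # shifting by the max avoids overflow in exp
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, skipping zero-probability terms."""
    return -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)


def cross_entropy(y, y_hat):
    """H(y, y_hat) = -sum_i y_i log2(y_hat_i), skipping terms with y_i = 0."""
    return -sum(y_i * math.log2(q_i) for y_i, q_i in zip(y, y_hat) if y_i > 0)


scores = [2.0, 1.0, 0.1]          # hypothetical network outputs for 3 classes
probs = softmax(scores)
target = [1.0, 0.0, 0.0]          # one-hot ground truth: class 0
loss = cross_entropy(target, probs)

print(probs)
print(loss)
```

With a one-hot target the loss reduces to $-\log_2$ of the probability assigned to the correct class, so it is small when the model is confident and right.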





Sampling Meshes 





For this project we use the ModelNet10 data set, which consists of 3D
meshes divided into 10 classes (bed, chair, desk, etc.). We use barycentric
coordinates to sample 1,024 random points from the surface of each mesh.
These form the point clouds our model takes as input. Some meshes are
sampled multiple times to ensure a balanced data set. After sampling the
3D meshes, we have 1,000 point clouds per class.
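One common way to sample a triangle mesh with barycentric coordinates is sketched below. It is not necessarily the exact procedure we used: picking triangles with probability proportional to their area and the square-root trick for uniform barycentric weights are assumptions of this sketch, as are the function names and the toy mesh.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(v * v for v in cross))


def sample_triangle(a, b, c, rng):
    """Uniform random point in triangle abc via barycentric weights (u, v, w)."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    u, v, w = 1.0 - s, s * (1.0 - r2), s * r2  # u + v + w = 1, all >= 0
    return tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3))


def sample_mesh(vertices, faces, n_points=1024, seed=0):
    """Sample n_points from the mesh surface, choosing triangles by area."""
    rng = random.Random(seed)
    tris = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*t) for t in tris]
    chosen = rng.choices(tris, weights=areas, k=n_points)
    return [sample_triangle(a, b, c, rng) for a, b, c in chosen]


# Toy mesh: a unit square in the plane z = 0, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

cloud = sample_mesh(vertices, faces)
print(len(cloud))  # 1024
```

Weighting triangles by area keeps the point density roughly even across the surface; without it, regions meshed with many small triangles would be over-sampled.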


Our Results 


[Figure: Training Curve for PointNet]

Class:

Accuracy (%): | 52 | 94 | 94 | 78 | 81 | 98 |


We implement PointNet with PyTorch. We tried SGD but found that the Adam
optimizer trained better, so we used it instead. We train using: Epochs
= 100, Batch Size = 32, Learning Rate = 107°. We chose the
sub-networks of PointNet to be of type f : R? → R? → R* and
g : R? → R2 → R1?, with Softmax at the end. Our model achieves an
accuracy of 80.2% on ModelNet10 (random guessing would give 10%).